@majiayu000
Contributor

Summary

  • Add enable_warmup parameter to HNSW and DiskANN embedding servers to pre-load model at startup
  • Implement warmup() method on LeannSearcher for manual pre-warming before first search
  • Add auto-warmup option during LeannSearcher initialization (enable_warmup=True)
  • Warmup computes a dummy embedding so the model is loaded and cached before the first real query
  • Add comprehensive tests for warmup functionality
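To illustrate the dummy-embedding idea from the list above, here is a minimal sketch. It is not the code in this PR: `LazyEmbedder`, `_load_model`, and the fake model are hypothetical stand-ins for a backend that loads its model lazily on first use.

```python
import time


class LazyEmbedder:
    """Hypothetical stand-in for an embedding backend with lazy model loading."""

    def __init__(self, load_seconds: float = 0.05):
        self._load_seconds = load_seconds
        self._model = None
        self.load_count = 0  # how many times the expensive load has run

    def _load_model(self):
        # Simulates the expensive cold-start step (reading weights, etc.).
        time.sleep(self._load_seconds)
        self.load_count += 1
        # Fake "model": maps text to a tiny fixed-size vector.
        return lambda text: [float(len(text)), 0.0, 0.0]

    def embed(self, text: str) -> list[float]:
        # The first call pays the load cost; later calls reuse the cached model.
        if self._model is None:
            self._model = self._load_model()
        return self._model(text)

    def warmup(self) -> float:
        """Compute one dummy embedding and return the elapsed seconds."""
        start = time.perf_counter()
        self.embed("warmup")
        return time.perf_counter() - start
```

Calling `warmup()` once at startup moves the load cost out of the first user-facing query; subsequent `embed()` calls hit the already-loaded model.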

Test Plan

  • Added unit tests in tests/test_warmup.py
  • Tests cover warmup method, auto-warmup, server parameter passing
  • All Python files pass syntax validation

Fixes #177
Fixes #159

- Add enable_warmup parameter to HNSW and DiskANN embedding servers
- Implement warmup() method on LeannSearcher for manual pre-warming
- Auto-warmup option during LeannSearcher initialization (enable_warmup=True)
- Pre-load embedding model at server startup to avoid cold-start latency
- Add comprehensive tests for warmup functionality

Fixes yichuan-w#177 (search recompute latency)
Fixes yichuan-w#159 (warmup strategy)
@yichuan-w yichuan-w requested a review from andylizf December 24, 2025 08:43
@yichuan-w
Owner

yichuan-w commented Dec 24, 2025

Thanks, this has been a known issue for a long time, and we will look into it! cc @andylizf. Also, can you fix the lint error here?

- Remove unused imports (tempfile, Path, MagicMock)
- Fix import order (stdlib before third-party)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>
@majiayu000
Contributor Author

The macOS-13 CI jobs show as 'cancelled' rather than 'failed' - this appears to be a GitHub Actions runner issue, not a code problem. All other builds (macos-14, macos-15, ubuntu) passed successfully.

Could you please re-run the cancelled macOS-13 jobs?

@yichuan-w
Owner

yichuan-w commented Dec 29, 2025

Sure, I will do that later. Sorry for the late response; I was on vacation. And thanks again for your contribution!

@majiayu000
Contributor Author

No worries, thanks for re-running the CI! Let me know if there's anything else that needs to be addressed.

@andylizf
Collaborator

Thanks for implementing this warmup feature! The functionality looks good and solves a real latency problem.

A few suggestions for potential future improvements (not blocking for this PR):

1. Extract common warmup logic

The warmup code in hnsw_embedding_server.py and diskann_embedding_server.py is duplicated. Consider extracting to a shared module:

# leann/warmup.py
def warmup_embedding_model(model_name: str, embedding_mode: str, provider_options=None) -> float:
    """Pre-load embedding model by computing a dummy embedding."""
    ...
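One way the shared helper could be filled in is sketched below. This is an assumption-laden illustration, not the project's implementation: the `compute_fn` parameter is hypothetical (injected so the sketch does not depend on any real embedding function), and the return value is assumed to be the elapsed warmup time in seconds.

```python
import time
from typing import Callable, Optional


def warmup_embedding_model(
    model_name: str,
    embedding_mode: str,
    provider_options: Optional[dict] = None,
    *,
    compute_fn: Callable[[list, str, str, Optional[dict]], object],
) -> float:
    """Pre-load an embedding model by computing one dummy embedding.

    `compute_fn` is a hypothetical hook: each server would pass in whatever
    function it already uses to compute embeddings. Returns elapsed seconds.
    """
    start = time.perf_counter()
    # A single short text is enough to force the model to load.
    compute_fn(["warmup"], model_name, embedding_mode, provider_options)
    return time.perf_counter() - start
```

Both embedding servers could then call this one function at startup instead of carrying their own copies of the warmup logic.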

2. Clarify enable_warmup semantics

Currently enable_warmup in LeannSearcher controls two things:

  • Server-side model preloading
  • Client-side dummy query

Consider separating these concerns in a future refactor.
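A possible shape for that separation is sketched here. All names (`WarmupOptions`, `preload_server_model`, `run_client_dummy_query`, `from_enable_warmup`) are hypothetical and only illustrate splitting one flag into two while keeping the current behavior as the default mapping.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WarmupOptions:
    """Hypothetical options object that separates the two warmup concerns."""

    preload_server_model: bool = False    # server loads the model at startup
    run_client_dummy_query: bool = False  # client sends one throwaway query

    @classmethod
    def from_enable_warmup(cls, enable_warmup: bool) -> "WarmupOptions":
        # Mirrors the behavior described above: one flag toggles both steps.
        return cls(
            preload_server_model=enable_warmup,
            run_client_dummy_query=enable_warmup,
        )
```

With something like this, a caller could preload the server model without issuing a client-side query, or the reverse, while `enable_warmup=True` would keep meaning "do both".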

3. Avoid calling private methods

LeannSearcher.warmup() calls self.backend_impl._ensure_server_running(). Consider exposing a public interface for this.
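As a rough sketch of that suggestion, the backend could expose a thin public method that delegates to the private one. The classes below are simplified stand-ins, not the real `LeannSearcher` or backend classes; `ensure_ready` is a hypothetical name, and the real `_ensure_server_running` may take arguments that are omitted here.

```python
class BackendSketch:
    """Simplified stand-in for a searcher backend."""

    def __init__(self):
        self._running = False
        self.start_count = 0  # number of times the server was actually started

    def _ensure_server_running(self) -> None:
        # Private detail: start the embedding server only if it is not running.
        if not self._running:
            self._running = True
            self.start_count += 1

    def ensure_ready(self) -> None:
        """Hypothetical public entry point that wraps the private method."""
        self._ensure_server_running()


class SearcherSketch:
    """Simplified stand-in for a searcher that owns a backend."""

    def __init__(self, backend: BackendSketch):
        self.backend_impl = backend

    def warmup(self) -> None:
        # Uses only the public interface of the backend.
        self.backend_impl.ensure_ready()
```

The searcher then depends only on a documented method, so the backend is free to change how it starts or checks the server.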


These are minor design notes - the PR is good to merge as-is. Nice work! 🎉

@andylizf
Collaborator

What do you think? @yichuan-w


3 participants